Prediction models¶

Building forecast models for the surface air temperature in Brazil - station A601 in the southeast

This code sets up the working environment for data analysis, visualization, and statistical testing on the temperature time series from a weather station in southeastern Brazil.

The loaded libraries cover data manipulation, plotting, statistical modeling, and model diagnostics. The additional settings ensure the code runs without noisy warnings and that plots use a consistent visual style.

In [48]:
## Load the required libraries

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
sns.set_style("darkgrid")
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox
from IPython.display import display
from statsmodels.sandbox.stats.runs import runstest_1samp

This code prepares two empty data structures (ResTableVal and ResTableTest) to store model-related information, such as model names and residual values.

These DataFrames will later be used to record and analyze the validation and test results of the different models.

In [49]:
# Create an empty DataFrame for Validation and Test
columns = ["ModelName", "ResValues"]
ResTableVal = pd.DataFrame(columns=columns)
ResTableTest = pd.DataFrame(columns=columns)

The "metricas" function evaluates the performance of a forecasting model. It computes the MAE, MSE, RMSE, MAPE, and SSE metrics, and runs the Ljung-Box test to check for autocorrelation in the residuals.

The function returns a DataFrame with the computed values.

In [50]:
### Function to compute the model metrics

from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error

def metricas(modelo, test, y_pred):
    # Calcular el p-valor de Ljung-Box
    residuals = test - y_pred
    max_lag = min(20, len(residuals) - 1)

    ljung_box_results = acorr_ljungbox(residuals, lags=[max_lag], return_df=True)
    p_value = ljung_box_results["lb_pvalue"].values[0]

    return pd.DataFrame({
        "Modelo": [modelo],
        "MAE": [mean_absolute_error(test, y_pred)],
        "MSE": [mean_squared_error(test, y_pred)],
        "RMSE": [np.sqrt(mean_squared_error(test, y_pred))],
        "MAPE": [mean_absolute_percentage_error(test, y_pred) * 100],
        "SSE": [mean_squared_error(test, y_pred) * len(test)],
        "LjB p-value": [p_value]
    }, columns=["Modelo", "MAE", "MSE", "RMSE", "MAPE", "SSE", "LjB p-value"])
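As a quick sanity check on what these metrics mean, they can be recomputed by hand with NumPy on a few invented numbers (the values below are illustrative only, not station data):

```python
import numpy as np

# Hypothetical actual temperatures and forecasts (illustrative only)
actual = np.array([20.1, 21.3, 22.0, 21.5])
pred = np.array([20.0, 21.0, 22.2, 21.4])

err = actual - pred
mae = np.mean(np.abs(err))                  # mean absolute error
mse = np.mean(err ** 2)                     # mean squared error
rmse = np.sqrt(mse)                         # root mean squared error
mape = np.mean(np.abs(err / actual)) * 100  # mean absolute percentage error
sse = np.sum(err ** 2)                      # sum of squared errors = MSE * n

print(mae, mse, rmse, mape, sse)
```

Note that SSE is exactly `mean_squared_error(test, y_pred) * len(test)`, which is how the function above computes it.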

This code loads a CSV file containing temperature data plus 25 additional meteorological variables.

It cleans and transforms the data (type conversions and removal of missing values) and filters the dataset to keep only temperature readings within a valid range.

The result is a clean DataFrame ready for analysis, containing only the date-time (Data_Hora) and the temperature.

In [51]:
## Load and clean the data

df = (
    pd.read_csv('/content/soudeste_a601.csv')
    .rename(columns={
        'Data': 'Data',
        'Hora': 'Hora',
        'TEMPERATURA DO AR - BULBO SECO, HORARIA (°C)': 'Temperature'
    })
    .assign(
        Data=lambda df: pd.to_datetime(df['Data'], format='%Y-%m-%d', errors='coerce'),
        Temperature=lambda df: pd.to_numeric(df['Temperature'], errors='coerce'),
        Data_Hora=lambda df: pd.to_datetime(df['Data'].astype(str) + ' ' + df['Hora'], format='%Y-%m-%d %H:%M', errors='coerce')
    )
    .dropna(subset=['Data', 'Hora', 'Temperature'])
)

## Keep only the timestamp and the target variable

data = (df.query("-50 <= Temperature <= 50")
       .filter(items=["Data_Hora", "Temperature"])
       )
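The same cleaning chain can be rehearsed end to end on a tiny in-memory sample (column names are copied from the code above; the rows are invented), which makes the effect of `dropna` and the range filter explicit:

```python
import io
import pandas as pd

# Tiny in-memory sample mimicking the station file layout (values invented)
raw = io.StringIO(
    'Data,Hora,"TEMPERATURA DO AR - BULBO SECO, HORARIA (°C)"\n'
    '2021-04-01,00:00,21.5\n'
    '2021-04-01,01:00,\n'      # missing temperature -> removed by dropna
    '2021-04-01,02:00,99.0\n'  # implausible value -> removed by the range filter
)

df = (
    pd.read_csv(raw)
    .rename(columns={'TEMPERATURA DO AR - BULBO SECO, HORARIA (°C)': 'Temperature'})
    .assign(
        Data=lambda d: pd.to_datetime(d['Data'], format='%Y-%m-%d', errors='coerce'),
        Temperature=lambda d: pd.to_numeric(d['Temperature'], errors='coerce'),
        Data_Hora=lambda d: pd.to_datetime(d['Data'].astype(str) + ' ' + d['Hora'],
                                           format='%Y-%m-%d %H:%M', errors='coerce')
    )
    .dropna(subset=['Data', 'Hora', 'Temperature'])
)

sample = df.query("-50 <= Temperature <= 50").filter(items=["Data_Hora", "Temperature"])
print(sample)  # only the 21.5 °C row survives
```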

The detect_outliers_iqr function identifies outliers in a dataset using the interquartile range (IQR).

It returns a boolean mask (True: outlier, False: not an outlier) indicating which values fall outside the bounds defined by the IQR.

In [52]:
def detect_outliers_iqr(data, threshold=1.5):
  Q1 = data.quantile(0.25)
  Q3 = data.quantile(0.75)
  IQR = Q3 - Q1
  lower_bound = Q1 - threshold * IQR
  upper_bound = Q3 + threshold * IQR
  outliers = (data < lower_bound) | (data > upper_bound)
  return outliers

This code detects outliers in the 'Temperature' column of the data DataFrame using the detect_outliers_iqr function.

It then counts and displays the number of outliers found, i.e. the number of True values in the resulting boolean mask.

Finally, it prints the result as: "Number of outliers: X", where X is the outlier count.

In [53]:
# Apply detect_outliers_iqr to data['Temperature'] and count the True values

outliers = detect_outliers_iqr(data['Temperature'])
outlier_count = outliers.sum()
print(f"Number of outliers: {outlier_count}")
Number of outliers: 1721

2. Data preparation¶

Simple exponential smoothing weights the historical data exponentially, giving more weight to recent observations and less to older ones. This is controlled by a smoothing factor ($λ$), a value between 0 and 1: the closer $λ$ is to 1, the more weight recent data receives.

The data in data are converted to a pandas Series indexed by the variable holding the date and time (Data_Hora).
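The geometric decay of those weights can be seen directly (a minimal sketch, with $λ = 0.5$ chosen for illustration):

```python
import numpy as np

lam = 0.5  # smoothing factor λ (illustrative value)
k = np.arange(6)
weights = lam * (1 - lam) ** k  # weight given to the observation k steps in the past

print(weights)        # 0.5, 0.25, 0.125, ... recent data dominates
print(weights.sum())  # partial sums approach 1 as more lags are included
```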

In [54]:
## Convert the _data_ DataFrame into a time-series object

data_ts = pd.Series(data=data['Temperature'].values, index=data['Data_Hora'])

data_ts.head()
data_ori = data_ts
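A Series with a DatetimeIndex, like data_ts, supports label-based time slicing, which is what the rolling splits below rely on. A minimal sketch with synthetic hourly stamps:

```python
import numpy as np
import pandas as pd

# Synthetic hourly timestamps, illustrative only
idx = pd.date_range('2021-04-01', periods=6, freq='h')
ts = pd.Series(np.arange(6, dtype=float), index=idx)

# With a DatetimeIndex, the series can be sliced by time labels:
window = ts['2021-04-01 02:00':'2021-04-01 04:00']
print(window)  # three hourly observations
```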

3. Autocorrelation¶

This code produces two plots: the autocorrelation and the partial autocorrelation of the time series (data_ts).

It uses plot_acf and plot_pacf from statsmodels to visualize the temporal dependence in the data over a range of 48 lags.

The plots are shown in a figure with two subplots, titled "Autocorrelación" and "Autocorrelación Parcial", respectively.

Finally, the layout is adjusted and the figure is displayed with plt.show().

In [55]:
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Autocorrelation on the left
plot_acf(data_ts, lags=48, ax=axes[0])
axes[0].set_title('Autocorrelación')

# Partial Autocorrelation on the right
plot_pacf(data_ts, lags=48, ax=axes[1])
axes[1].set_title('Autocorrelación Parcial')

plt.tight_layout()
plt.show()

4. Original data series¶

This code draws a line plot of the time series data_ts, the surface temperature at station Soudeste Brasil A601 from May 2000 to April 2021.

The axes are labeled: the X axis represents time at hourly frequency, and the Y axis represents the temperature in degrees Celsius.

In [56]:
plt.figure(figsize=(15, 5))
data_ts.plot(color='b')
plt.title('Temperatura superficial - Soudeste Brasil A601 (Mayo 2000 - Abril 2021)')
plt.xlabel('Tiempo - Frecuencia horaria')
plt.ylabel('Temperature (°C)');

5. Simple Exponential Smoothing (SES) model¶

Below is the Python code to build the SES model

This code implements a first-order simple exponential smoothing (SES) model to forecast the temperature time series.

It is based on one of the smoothing functions written in class.

It splits the data into training, validation, and test sets, and produces rolling forecasts over moving windows.

It evaluates model performance with metrics such as MAPE, MAE, RMSE, and MSE, and runs statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) on the residuals.

Finally, it plots the forecasts (annotated with the normality and randomness test results), the residuals, and their autocorrelation, and stores the residuals in tables for later use.

The metric values and the interpretation of the normality and randomness tests are reported in separate tables.

In [57]:
# JDF-2025Mar09 - SES with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
import scipy.stats as stats
from statsmodels.sandbox.stats.runs import runstest_1samp  # For Wald-Wolfowitz test

# First-order exponential smoothing function
def firstsmooth(y, lambda_, start=None):
    """
    First-order exponential smoothing.

    Parameters:
        y (pd.Series): Input time series.
        lambda_ (float): Smoothing parameter (0 < lambda_ < 1).
        start (float): Initial value for smoothing. If None, uses the first observation.

    Returns:
        pd.Series: Smoothed values.
    """
    ytilde = y.copy()
    if start is None:
        start = y.iloc[0]
    # Use positional access (.iloc) so the function works with a DatetimeIndex
    ytilde.iloc[0] = lambda_ * y.iloc[0] + (1 - lambda_) * start
    for i in range(1, len(y)):
        ytilde.iloc[i] = lambda_ * y.iloc[i] + (1 - lambda_) * ytilde.iloc[i - 1]
    return ytilde

# Function to forecast using first-order exponential smoothing
def forecast_firstsmooth(train, steps, lambda_):
    """
    Forecast future values using first-order exponential smoothing.

    Parameters:
        train (pd.Series): Training data.
        steps (int): Number of steps to forecast.
        lambda_ (float): Smoothing parameter (0 < lambda_ < 1).

    Returns:
        pd.Series: Forecasted values.
    """
    # Fit the model on the training data
    smoothed_train = firstsmooth(train, lambda_)

    # The last smoothed value is the starting point for forecasting
    last_smoothed = smoothed_train.iloc[-1]

    # Forecast future values
    forecast = np.zeros(steps)
    forecast[0] = lambda_ * train.iloc[-1] + (1 - lambda_) * last_smoothed
    for i in range(1, steps):
        forecast[i] = forecast[i - 1]  # SES produces a flat multi-step forecast

    # Convert forecast to a pandas Series with appropriate index
    forecast_index = pd.date_range(start=train.index[-1], periods=steps + 1, freq='H')[1:]
    return pd.Series(forecast, index=forecast_index)

nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: Last 24 values
test = data_ts.iloc[-test_size:]

# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Rolling forecast with a 1-hour window
lambda_ = 0.5  # Smoothing parameter
window_size = 1  # Rolling window size in hours

# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []

# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([train, validation.iloc[:i]])

    # Forecast for the next window
    current_forecast = forecast_firstsmooth(current_train, steps=window_size, lambda_=lambda_)

    # Append forecasts and actuals for Validation
    val_rolling_forecasts.extend(current_forecast.values)
    val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])

# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation])  # Combine training and validation data
for i in range(0, len(test), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([combined_train_val, test.iloc[:i]])

    # Forecast for the next window
    current_forecast = forecast_firstsmooth(current_train, steps=window_size, lambda_=lambda_)

    # Append forecasts and actuals for Test
    test_rolling_forecasts.extend(current_forecast.values)
    test_rolling_actuals.extend(test.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])

# 3. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, and MSE.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts

# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts

# Calculate Ljung-Box p-value for rolling residuals
val_rolling_ljungbox = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)
val_rolling_metrics['Ljung-Box p-value'] = val_rolling_ljungbox['lb_pvalue'].values[0]

test_rolling_ljungbox = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)
test_rolling_metrics['Ljung-Box p-value'] = test_rolling_ljungbox['lb_pvalue'].values[0]

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)

# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')

# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 4. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'Rolling First-Order Exponential Smoothing Forecast (Lambda={lambda_})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 5. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 6. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: SES
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[0] = {"ModelName": "SES", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[0] = {"ModelName": "SES", "ResValues": tempSeries.values}
Metrics Table:

              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  3.9233  0.8110  1.0493  1.1010             0.0006              0.0080                0.1180
Test        4.1570  0.8646  1.1998  1.4395             0.0001              0.1734                0.6289

Test Interpretation Table:

             Randomness  KS-Normality
Validation   Not Random        Normal
Test             Random        Normal

7. Double Exponential Smoothing (DES) model¶

Below is the Python code to build the DES model¶

This code implements a second-order exponential smoothing (DES) model to forecast the temperature time series.

It builds on the smoothing functions written in class, extended to second order (the implementation follows Holt's linear trend method, as its docstring notes).

It splits the data into training, validation, and test sets, and produces rolling forecasts over moving windows.

It evaluates model performance with metrics such as MAPE, MAE, RMSE, and MSE, and runs statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) on the residuals.

Finally, it plots the forecasts (annotated with the normality and randomness test results), the residuals, and their autocorrelation, and stores the residuals in tables for later use.

The metric values and the interpretation of the normality and randomness tests are reported in separate tables.

In [58]:
# JDF-2025Mar09 - DES with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Second-order exponential smoothing function
def secondsmooth(y, alpha, beta, start_level=None, start_trend=None):
    """
    Second-order exponential smoothing (Holt's linear trend method).

    Parameters:
        y (pd.Series): Input time series.
        alpha (float): Smoothing parameter for level (0 < alpha < 1).
        beta (float): Smoothing parameter for trend (0 < beta < 1).
        start_level (float): Initial level value. If None, uses the first observation.
        start_trend (float): Initial trend value. If None, uses the first difference.

    Returns:
        pd.Series: Smoothed values.
    """
    ytilde = np.zeros(len(y))
    if start_level is None:
        start_level = y.iloc[0]
    if start_trend is None:
        start_trend = y.iloc[1] - y.iloc[0]

    level = start_level
    trend = start_trend

    for i in range(len(y)):
        ytilde[i] = level + trend
        if i < len(y) - 1:
            # Holt recursions: update the level, then the trend
            new_level = alpha * y.iloc[i] + (1 - alpha) * (level + trend)
            new_trend = beta * (new_level - level) + (1 - beta) * trend
            level, trend = new_level, new_trend

    return pd.Series(ytilde, index=y.index)

# Function to forecast using second-order exponential smoothing
def forecast_secondsmooth(train, steps, alpha, beta):
    """
    Forecast future values using second-order exponential smoothing.

    Parameters:
        train (pd.Series): Training data.
        steps (int): Number of steps to forecast.
        alpha (float): Smoothing parameter for level (0 < alpha < 1).
        beta (float): Smoothing parameter for trend (0 < beta < 1).

    Returns:
        pd.Series: Forecasted values.
    """
    # Fit the model on the training data
    smoothed_train = secondsmooth(train, alpha, beta)

    # Extract the last level and trend
    last_level = alpha * train.iloc[-1] + (1 - alpha) * (smoothed_train.iloc[-2] + (smoothed_train.iloc[-1] - smoothed_train.iloc[-2]))
    last_trend = beta * (last_level - smoothed_train.iloc[-2]) + (1 - beta) * (smoothed_train.iloc[-1] - smoothed_train.iloc[-2])

    # Forecast future values
    forecast = np.zeros(steps)
    forecast[0] = last_level + last_trend
    for i in range(1, steps):
        forecast[i] = forecast[i - 1] + last_trend

    # Convert forecast to a pandas Series with appropriate index
    forecast_index = pd.date_range(start=train.index[-1], periods=steps + 1, freq='H')[1:]
    return pd.Series(forecast, index=forecast_index)

nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: Last 24 values
test = data_ts.iloc[-test_size:]

# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Rolling forecast with a 1-hour window
alpha = 0.5  # Smoothing parameter for level
beta = 0.3   # Smoothing parameter for trend
window_size = 1  # Rolling window size in hours

# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []

# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([train, validation.iloc[:i]])

    # Forecast for the next window
    current_forecast = forecast_secondsmooth(current_train, steps=window_size, alpha=alpha, beta=beta)

    # Append forecasts and actuals for Validation
    val_rolling_forecasts.extend(current_forecast.values)
    val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])

# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation])  # Combine training and validation data
for i in range(0, len(test), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([combined_train_val, test.iloc[:i]])

    # Forecast for the next window
    current_forecast = forecast_secondsmooth(current_train, steps=window_size, alpha=alpha, beta=beta)

    # Append forecasts and actuals for Test
    test_rolling_forecasts.extend(current_forecast.values)
    test_rolling_actuals.extend(test.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])

# 3. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, and MSE.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts

# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts

# Calculate Ljung-Box p-value for rolling residuals
val_rolling_ljungbox = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)
val_rolling_metrics['Ljung-Box p-value'] = val_rolling_ljungbox['lb_pvalue'].values[0]

test_rolling_ljungbox = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)
test_rolling_metrics['Ljung-Box p-value'] = test_rolling_ljungbox['lb_pvalue'].values[0]

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)

# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')

# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 4. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'Rolling Second-Order Exponential Smoothing Forecast (Alpha={alpha}, Beta={beta})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 5. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 6. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: DES
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[1] = {"ModelName": "DES", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[1] = {"ModelName": "DES", "ResValues": tempSeries.values}
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  4.3579  0.9152  1.1965  1.4316             0.0018              0.0263                0.8145
Test        5.4639  1.1326  1.3748  1.8901             0.0004              0.0228                0.7360

Test Interpretation Table:
            Randomness KS-Normality
Validation  Not Random       Normal
Test        Not Random       Normal

8. Holt-Winters Simple Exponential Smoothing Model (SES-HW)¶

This code implements a Holt-Winters simple exponential smoothing (SES-HW) model using the statsmodels library to produce forecasts for the air-temperature time series at station A601 in southeastern Brazil.

It splits the data into training, validation, and test sets and produces forecasts over moving windows (rolling forecasts).

It evaluates model performance with metrics such as MAPE, MAE, RMSE, and MSE, and runs statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) on the residuals.

Finally, it plots the forecasts (annotated with the results of the normality and randomness tests), the residuals, and the autocorrelation analysis, and stores the residuals in tables for later use.

The metric values and the interpretation of the normality and randomness results are presented in separate tables.

In [59]:
# JDF-2025Mar09 - SES-HW with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error

nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: Last 24 values
test = data_ts.iloc[-test_size:]

# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Create a first-order SES model
def ses_forecast(train, steps, alpha=0.5):
    """
    Simple Exponential Smoothing (SES) forecast.

    Parameters:
        train (pd.Series): Training data.
        steps (int): Number of steps to forecast.
        alpha (float): Smoothing parameter (0 < alpha < 1).

    Returns:
        pd.Series: Forecasted values.
    """
    model = SimpleExpSmoothing(train)
    model_fit = model.fit(smoothing_level=alpha, optimized=False)
    forecast = model_fit.forecast(steps=steps)
    return forecast, model_fit

# 3. Rolling forecast with a 1-hour window (window_size = 1)
alpha = 0.5  # Smoothing parameter
window_size = 1  # Rolling window size in hours

# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []

# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([train, validation.iloc[:i]])

    # Forecast for the next window
    current_forecast, _ = ses_forecast(current_train, steps=window_size, alpha=alpha)

    # Append forecasts and actuals for Validation
    val_rolling_forecasts.extend(current_forecast.values)
    val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])

# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation])  # Combine training and validation data
for i in range(0, len(test), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([combined_train_val, test.iloc[:i]])

    # Forecast for the next window
    current_forecast, _ = ses_forecast(current_train, steps=window_size, alpha=alpha)

    # Append forecasts and actuals for Test
    test_rolling_forecasts.extend(current_forecast.values)
    test_rolling_actuals.extend(test.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])

# 4. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, and MSE (the Ljung-Box p-value is added separately below).
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts
val_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1

# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts
test_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)

# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov normality test (note: kstest with 'norm' compares against
# a standard normal N(0,1); the residuals are not standardized first, so deviations in
# mean or scale also lower the p-value)
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')

# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 5. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'SES Rolling Forecast (Alpha={alpha})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 6. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
# axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: SES-HW
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[2] = {"ModelName": "SES-HW", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[2] = {"ModelName": "SES-HW", "ResValues": tempSeries.values}
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  4.9992  1.0321  1.2778  1.6328             0.0001              0.0026                0.0457
Test        5.1689  1.0843  1.4966  2.2397             0.0000              0.0026                0.5640

Test Interpretation Table:
            Randomness KS-Normality
Validation  Not Random   Not Normal
Test        Not Random       Normal

9. Holt-Winters Double Exponential Smoothing Model (DES-HW)¶

Below is the Python code used to build the DES-HW model.

This code implements a Holt-Winters double exponential smoothing (DES-HW) model using the statsmodels library to produce forecasts for the air-temperature time series measured at weather station A601 in southeastern Brazil.

It splits the data into training, validation, and test sets and produces forecasts over moving windows (rolling forecasts).

It evaluates model performance with metrics such as MAPE, MAE, RMSE, and MSE, and runs statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) on the residuals.

Finally, it plots the forecasts, the residuals, and the autocorrelation analysis, and stores the residuals in tables for later use.

The metric values and the interpretation of the normality and randomness results are presented in separate tables.

In [60]:
# JDF-2025Mar09 - DES-HW with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error

nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: Last 24 values
test = data_ts.iloc[-test_size:]

# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Create a Double Exponential Smoothing (DES) model
def des_forecast(train, steps, alpha=0.5, beta=0.5):
    """
    Double Exponential Smoothing (DES) forecast.

    Parameters:
        train (pd.Series): Training data.
        steps (int): Number of steps to forecast.
        alpha (float): Smoothing parameter for level (0 < alpha < 1).
        beta (float): Smoothing parameter for trend (0 < beta < 1).

    Returns:
        pd.Series: Forecasted values.
    """
    model = ExponentialSmoothing(train, trend='add', seasonal=None)
    model_fit = model.fit(smoothing_level=alpha, smoothing_trend=beta, optimized=False)
    forecast = model_fit.forecast(steps=steps)
    return forecast, model_fit

# 3. Rolling forecast with a 1-hour window (window_size = 1)
alpha = 0.5  # Smoothing parameter for level
beta = 0.5   # Smoothing parameter for trend
window_size = 1  # Rolling window size in hours

# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []

# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([train, validation.iloc[:i]])

    # Forecast for the next window
    current_forecast, _ = des_forecast(current_train, steps=window_size, alpha=alpha, beta=beta)

    # Append forecasts and actuals for Validation
    val_rolling_forecasts.extend(current_forecast.values)
    val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])

# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation])  # Combine training and validation data
for i in range(0, len(test), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([combined_train_val, test.iloc[:i]])

    # Forecast for the next window
    current_forecast, _ = des_forecast(current_train, steps=window_size, alpha=alpha, beta=beta)

    # Append forecasts and actuals for Test
    test_rolling_forecasts.extend(current_forecast.values)
    test_rolling_actuals.extend(test.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])

# 4. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, and MSE (the Ljung-Box p-value is added separately below).
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts
val_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1

# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts
test_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)

# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov normality test (note: kstest with 'norm' compares against
# a standard normal N(0,1); the residuals are not standardized first, so deviations in
# mean or scale also lower the p-value)
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')

# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 5. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'DES Rolling Forecast (Alpha={alpha}, Beta={beta})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 6. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: DES-HW
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[3] = {"ModelName": "DES-HW", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[3] = {"ModelName": "DES-HW", "ResValues": tempSeries.values}
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  4.5535  0.9613  1.2635  1.5964             0.0006               0.007                0.8043
Test        5.5504  1.1652  1.4165  2.0066             0.0001               0.008                0.3423

Test Interpretation Table:
            Randomness KS-Normality
Validation  Not Random       Normal
Test        Not Random       Normal

10. Exponential Smoothing - Holt-Winters¶

Moving-average method: a window of fixed width is chosen, and the data points inside that window are averaged to form a new data point. The window then slides along the dataset to produce moving averages for the entire series.

Exponential smoothing method: new predictions are formed as a weighted average of past data points. More recent observations receive larger weights and older observations smaller ones, on the assumption that the series is most strongly related to its recent past.

Variants of exponential smoothing: the method further divides into simple, double, and triple exponential smoothing, which are used to model the different components of the data (level, trend, and seasonality).
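The weighted-average recursion behind simple exponential smoothing can be sketched directly. This is a minimal illustration with made-up sample values, not the statsmodels implementation used elsewhere in this notebook:

```python
import numpy as np

def ses_manual(y, alpha):
    """Simple exponential smoothing via the explicit recursion
    s_t = alpha * y_t + (1 - alpha) * s_{t-1}."""
    s = np.empty(len(y))
    s[0] = y[0]  # initialize the level at the first observation
    for t in range(1, len(y)):
        # newest observation gets weight alpha; past levels decay geometrically
        s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
    return s

temps = np.array([20.0, 22.0, 21.0, 23.0])  # hypothetical hourly temperatures
levels = ses_manual(temps, alpha=0.5)
print(levels)      # [20. 21. 21. 22.]
print(levels[-1])  # the one-step-ahead SES forecast: 22.0
```

With alpha close to 1 the smoothed series tracks the data closely; with alpha close to 0 it reacts slowly, which is the trade-off tuned in the models below.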

The data are split into a training set and a test set; the latter will eventually be compared against the prediction.

This code defines a function ts_decompose that decomposes a time series into its main components: trend, seasonality, and residuals.

It uses the seasonal_decompose function from statsmodels to decompose the series under an additive or a multiplicative model.

It then produces a figure with 4 subplots showing the original series, the trend, the seasonality, and the residuals, together with their means.

The function also lets you specify the seasonal period, and it suppresses warnings for cleaner output.

In [61]:
### Load the required libraries

import itertools
import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
import statsmodels.tsa.api as smt


warnings.filterwarnings('ignore')

def ts_decompose(y, model="additive", stationary=False, period=24):  # note: 'stationary' is currently unused
    result = seasonal_decompose(y, model=model, period=period)
    fig, axes = plt.subplots(4, 1, sharex=True, sharey=False)
    fig.set_figheight(10)
    fig.set_figwidth(15)

    axes[0].set_title("Decomposition for " + model + " model")
    axes[0].plot(y, label='Original ' + model)
    axes[0].legend(loc='upper left')

    axes[1].plot(result.trend, label='Trend')
    axes[1].legend(loc='upper left')

    axes[2].plot(result.seasonal, label='Seasonality & Mean: ' + str(round(result.seasonal.mean(), 4)))
    axes[2].legend(loc='upper left')

    axes[3].plot(result.resid, label='Residuals & Mean: ' + str(round(result.resid.mean(), 4)))
    axes[3].legend(loc='upper left')
    plt.show(block=True)
In [32]:
ts_decompose(data_ts, stationary=True)

Based on the output of this function, the 'Level', 'Trend', and 'Seasonality' components of the series were modeled successfully, and the distribution of the residuals is shown in the plot. The residuals are scattered around zero and follow no clear trend, so we can infer that the series has an 'additive' character.

11. Holt-Winters Triple Exponential Smoothing Model (TES-HW)¶

This code implements a third-order (triple) Holt-Winters exponential smoothing (TES-HW) model using the statsmodels library to produce forecasts for the air-temperature time series measured at weather station A601 in southeastern Brazil.

It splits the data into training, validation, and test sets and produces forecasts over moving windows (rolling forecasts).

It evaluates model performance with metrics such as MAPE, MAE, RMSE, and MSE, and runs statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) on the residuals.

Finally, it plots the forecasts, the residuals, and the autocorrelation analysis, and stores the residuals in tables for later use.

The metric values and the interpretation of the normality and randomness results are presented in separate tables.

In [62]:
# JDF-2025Mar09 - TES-HW with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error

nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: Last 24 values
test = data_ts.iloc[-test_size:]

# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Create a Third-Order Holt-Winters Exponential Smoothing model
def hw_forecast(train, steps, seasonal_periods=24, alpha=0.5, beta=0.5, gamma=0.5, seasonal='add'):
    """
    Third-Order Holt-Winters Exponential Smoothing forecast.

    Parameters:
        train (pd.Series): Training data.
        steps (int): Number of steps to forecast.
        seasonal_periods (int): Number of periods in a season.
        alpha (float): Smoothing parameter for level (0 < alpha < 1).
        beta (float): Smoothing parameter for trend (0 < beta < 1).
        gamma (float): Smoothing parameter for seasonality (0 < gamma < 1).
        seasonal (str): Type of seasonality ('add' or 'mul').

    Returns:
        pd.Series: Forecasted values.
    """
    model = ExponentialSmoothing(
        train,
        trend='add',
        seasonal=seasonal,
        seasonal_periods=seasonal_periods
    )
    model_fit = model.fit(
        smoothing_level=alpha,
        smoothing_trend=beta,
        smoothing_seasonal=gamma,
        optimized=False
    )
    forecast = model_fit.forecast(steps=steps)
    return forecast, model_fit

# 3. Rolling Forecast with a 1-hour window
alpha = 0.3  # Smoothing parameter for level
beta = 0.1   # Smoothing parameter for trend
gamma = 0.3  # Smoothing parameter for seasonality
seasonal_periods = 24  # Assuming hourly data with daily seasonality (24 hours)
window_size = 1  # Rolling window size in hours

# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []

# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([train, validation.iloc[:i]])

    # Forecast for the next window
    current_forecast, _ = hw_forecast(current_train, steps=window_size, alpha=alpha, beta=beta, gamma=gamma, seasonal='add')

    # Append forecasts and actuals for Validation
    val_rolling_forecasts.extend(current_forecast.values)
    val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])

# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation])  # Combine training and validation data
for i in range(0, len(test), window_size):
    # Update training data with the most recent window
    current_train = pd.concat([combined_train_val, test.iloc[:i]])

    # Forecast for the next window
    current_forecast, _ = hw_forecast(current_train, steps=window_size, alpha=alpha, beta=beta, gamma=gamma, seasonal='add')

    # Append forecasts and actuals for Test
    test_rolling_forecasts.extend(current_forecast.values)
    test_rolling_actuals.extend(test.iloc[i:i + window_size].values)

# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])

# 4. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    # Align indices and drop rows with NaN values
    aligned_actual, aligned_forecast = actual.align(forecast, join='inner')
    aligned_actual = aligned_actual.dropna()
    aligned_forecast = aligned_forecast.dropna()

    # Avoid division by zero
    aligned_actual = aligned_actual.replace(0, 1e-9)

    # Calculate metrics
    metrics = {
        'MAPE': np.mean(np.abs((aligned_actual - aligned_forecast) / aligned_actual)) * 100,
        'MAE': mean_absolute_error(aligned_actual, aligned_forecast),
        'RMSE': np.sqrt(mean_squared_error(aligned_actual, aligned_forecast)),
        'MSE': mean_squared_error(aligned_actual, aligned_forecast),
    }
    return metrics

# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts

# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_rolling_residuals.dropna()
if len(val_residuals_cleaned) > 0:
    val_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # lag-1 p-value from the lags=1..10 table
else:
    val_rolling_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts

# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_rolling_residuals.dropna()
if len(test_residuals_cleaned) > 0:
    test_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # lag-1 p-value from the lags=1..10 table
else:
    test_rolling_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)

# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
# Note: kstest(..., 'norm') compares against the *standard* normal N(0, 1); the residuals are not standardized first
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')

# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 5. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'Holt-Winters Rolling Forecast (Alpha={alpha}, Beta={beta}, Gamma={gamma})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 6. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: TES-HW
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[4] = {"ModelName": "TES-HW", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[4] = {"ModelName": "TES-HW", "ResValues": tempSeries.values}
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  5.1354  1.0544  1.4955  2.2364             0.0001               0.008                0.6577
Test        3.7055  0.7754  0.9829  0.9661             0.0002               0.094                0.2734

Test Interpretation Table:
            Randomness KS-Normality
Validation  Not Random       Normal
Test            Random       Normal
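The TES-HW model above delegates the smoothing to statsmodels' ExponentialSmoothing with fixed alpha, beta and gamma. For intuition, the additive Holt-Winters recursions behind it can be sketched in plain Python (a minimal sketch with a naive initialization, which is an assumption here; statsmodels uses more careful initialization heuristics, so numbers will differ):

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters (triple exponential smoothing) with a naive
    initialization; returns h step-ahead forecasts. Requires len(y) >= 2*m."""
    # Naive initial state: mean of the first season, trend from block means,
    # and raw deviations from the level as the seasonal indices
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        prev_level = level
        s = season[t % m]
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)   # level update
        trend = beta * (level - prev_level) + (1 - beta) * trend     # trend update
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s     # seasonal update
    # Forecast: extrapolate level + trend and reuse the last seasonal estimates
    return [level + (k + 1) * trend + season[(len(y) + k) % m] for k in range(h)]

forecast = holt_winters_additive([0.0, 1.0] * 10, m=2, alpha=0.3, beta=0.1, gamma=0.3, h=2)
print(forecast)  # the alternating pattern continues: approximately [0.0, 1.0]
```

On a perfectly periodic input the recursion simply reproduces the seasonal pattern, which is the behavior the rolling TES-HW forecasts exploit hour by hour.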

12. Recurrent Neural Networks (RNN)¶

Below is the Python code for building the RNN model.

This code implements a Recurrent Neural Network (RNN) model to forecast the temperature time series at station A601 in southeastern Brazil. It is built on the TensorFlow/Keras library.

It splits the data into training, validation and test sets, and uses a time window (look_back) to predict the next value.

The model is trained on normalized data and evaluated with metrics such as MAPE, MAE, RMSE and MSE. Statistical tests (Ljung-Box, Wald-Wolfowitz and Kolmogorov-Smirnov) are also run on the residuals.

Plots of the forecasts, residuals and autocorrelation are produced, and the residuals are stored in tables for later use.

The metric values and the interpretation of the normality and randomness tests are presented in separate tables.
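The look_back windowing used here (the cell below implements it as `create_dataset`) amounts to sliding a fixed-length window over the series so that each window of past values predicts the value right after it. A minimal standalone sketch:

```python
def make_windows(series, look_back):
    """Split a 1-D sequence into (X, y) pairs: each window of `look_back`
    consecutive values is paired with the value that immediately follows it."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return X, y

X, y = make_windows([1, 2, 3, 4, 5], look_back=2)
# X == [[1, 2], [2, 3], [3, 4]] and y == [3, 4, 5]
```

Note that a series of length n yields only n - look_back samples, which is why the validation and test slices below carry 6 extra leading values.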

In [63]:
# JDF-2025Mar09 - RNN

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# 0. Set data_ts with the time series original values
data_ts = data_ori

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: last 24 values, plus 6 preceding values so the first look_back window has history
test = data_ts.iloc[-test_size-6:]

# Validation: the 24 values before test, also with 6 preceding values for the first window
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Prepare data for RNN
def create_dataset(data, look_back=6):
    X, y = [], []
    if len(data) > look_back:
        for i in range(len(data) - look_back):
            X.append(data.iloc[i:i + look_back].values)
            y.append(data.iloc[i + look_back])
    return np.array(X), np.array(y)

# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))

# Create datasets
look_back = 6  # Use the last 6 hours to predict the next hour
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)

# Reshape input to be [samples, time steps, features]
print(X_val.shape)
print(X_test.shape)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# 3. Build the RNN model
def build_rnn_model(input_shape):
    """
    Build an RNN model.

    Parameters:
        input_shape (tuple): Shape of the input data (time steps, features).

    Returns:
        model: Compiled RNN model.
    """
    model = Sequential()
    model.add(SimpleRNN(50, input_shape=input_shape, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model

# Build and train the model
model = build_rnn_model((X_train.shape[1], X_train.shape[2]))
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=1)

# 4. Forecast for validation and test
def forecast_rnn(model, data, look_back):
    """
    Forecast future values using the RNN model.

    Parameters:
        model: Trained RNN model.
        data (np.array): Input data.
        look_back (int): Number of previous time steps to use as input.

    Returns:
        forecast: Forecasted values.
    """
    forecast = []
    for i in range(len(data) - look_back):
        X = data[i:i + look_back].reshape(1, look_back, 1)
        y_pred = model.predict(X, verbose=0)
        forecast.append(y_pred[0, 0])
    return np.array(forecast)

# Forecast for validation
val_forecast_scaled = forecast_rnn(model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_forecast = pd.Series(val_forecast, index=validation.index[look_back:])

# Forecast for test
test_forecast_scaled = forecast_rnn(model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_forecast = pd.Series(test_forecast, index=test.index[look_back:])

# 5. Calculate metrics
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast)
val_residuals = validation.iloc[look_back:] - val_forecast

# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
    val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # lag-1 p-value from the lags=1..10 table
else:
    val_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast)
test_residuals = test.iloc[look_back:] - test_forecast

# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
    test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # lag-1 p-value from the lags=1..10 table
else:
    test_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)

# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
# Note: kstest(..., 'norm') compares against the *standard* normal N(0, 1); the residuals are not standardized first
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')

# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 6. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_forecast.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test_forecast.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'RNN Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_cleaned, lags=min(20, len(val_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_cleaned, lags=min(20, len(test_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: RNN
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[5] = {"ModelName": "RNN", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[5] = {"ModelName": "RNN", "ResValues": tempSeries.values}
(24, 6)
(24, 6)
Epoch 1/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 14s 2ms/step - loss: 0.0027 - val_loss: 5.2126e-04
Epoch 2/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 21s 2ms/step - loss: 0.0010 - val_loss: 7.7277e-04
Epoch 3/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 0.0011 - val_loss: 5.4140e-04
Epoch 4/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 14s 3ms/step - loss: 0.0010 - val_loss: 5.4864e-04
Epoch 5/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 19s 2ms/step - loss: 9.9340e-04 - val_loss: 6.6526e-04
Epoch 6/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 9.9997e-04 - val_loss: 5.6443e-04
Epoch 7/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.0010 - val_loss: 5.8493e-04
Epoch 8/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 9.9621e-04 - val_loss: 5.9492e-04
Epoch 9/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 0.0010 - val_loss: 5.6386e-04
Epoch 10/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 20s 2ms/step - loss: 9.7589e-04 - val_loss: 5.4842e-04
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  2.9711  0.6226  0.7658  0.5864             0.4127              0.5312                0.6104
Test        2.8239  0.5768  0.7151  0.5113             0.1757              0.8347                0.2599

Test Interpretation Table:
           Randomness KS-Normality
Validation     Random       Normal
Test           Random       Normal

13. Multilayer Perceptron (MLP)¶

Below is the Python code for building the MLP model.

This code implements a Multilayer Perceptron (MLP) model to forecast the temperature time series measured at station A601 in southeastern Brazil.

It splits the data into training, validation and test sets, and uses a time window (look_back) to predict the next value.

The model is trained on normalized data and evaluated with metrics such as MAPE, MAE, RMSE and MSE. Statistical tests (Ljung-Box, Wald-Wolfowitz and Kolmogorov-Smirnov) are also run on the residuals.

Plots of the forecasts, residuals and autocorrelation are produced, and the residuals are stored in tables for later use.

The metric values and the interpretation of the normality and randomness tests are presented in separate tables.
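As in the RNN cell, the MLP is trained on data scaled to [0, 1] with a scaler fitted on the training set only. The mechanics of that min-max scaling, sketched here with illustrative stand-in helpers (not sklearn's MinMaxScaler API):

```python
def minmax_fit(train_values):
    """Learn (min, max) from the training data only."""
    return min(train_values), max(train_values)

def minmax_transform(values, lo, hi):
    """Scale values into [0, 1] relative to the training range."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

lo, hi = minmax_fit([10.0, 20.0, 30.0])
scaled = minmax_transform([10.0, 25.0, 30.0], lo, hi)  # [0.0, 0.75, 1.0]
restored = minmax_inverse(scaled, lo, hi)              # back to [10.0, 25.0, 30.0]
```

Because the range comes from the training set, validation or test values outside that range map outside [0, 1]; this mirrors how the cells below call `scaler.transform` on validation and test and `scaler.inverse_transform` on the forecasts.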

In [64]:
# JDF-2025Mar09 - MLP

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 0. Set the original data in data_ts
data_ts = data_ori

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: last 24 values, plus 6 preceding values so the first look_back window has history
test = data_ts.iloc[-test_size-6:]

# Validation: the 24 values before test, also with 6 preceding values for the first window
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Prepare data for MLP
def create_dataset(data, look_back=24):
    """
    Create sequences of data for MLP input.

    Parameters:
        data (pd.Series): Input time series.
        look_back (int): Number of previous time steps to use as input.

    Returns:
        X, y: Input features and target values.
    """
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data.iloc[i:i + look_back].values)
        y.append(data.iloc[i + look_back])
    return np.array(X), np.array(y)

# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))

# Create datasets
look_back = 6  # Use the last 6 hours to predict the next hour
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)

# Reshape input to be [samples, features] for MLP
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1])
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1])
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1])

# 3. Build the MLP model
def build_mlp_model(input_shape):
    """
    Build an MLP model.

    Parameters:
        input_shape (int): Number of input features.

    Returns:
        model: Compiled MLP model.
    """
    model = Sequential()
    model.add(Dense(50, input_dim=input_shape, activation='relu'))  # First hidden layer
    model.add(Dense(25, activation='relu'))  # Second hidden layer
    model.add(Dense(1))  # Output layer
    model.compile(optimizer='adam', loss='mse')
    return model

# Build and train the model
model = build_mlp_model(X_train.shape[1])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=1)

# 4. Forecast for validation and test
def forecast_mlp(model, data, look_back):
    """
    Forecast future values using the MLP model.

    Parameters:
        model: Trained MLP model.
        data (np.array): Input data.
        look_back (int): Number of previous time steps to use as input.

    Returns:
        forecast: Forecasted values.
    """
    forecast = []
    for i in range(len(data) - look_back):
        X = data[i:i + look_back].reshape(1, look_back)
        y_pred = model.predict(X, verbose=0)
        forecast.append(y_pred[0, 0])
    return np.array(forecast)

# Forecast for validation
val_forecast_scaled = forecast_mlp(model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_forecast = pd.Series(val_forecast, index=validation.index[look_back:])

# Forecast for test
test_forecast_scaled = forecast_mlp(model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_forecast = pd.Series(test_forecast, index=test.index[look_back:])

# 5. Calculate metrics
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast)
val_residuals = validation.iloc[look_back:] - val_forecast

# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
    val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # lag-1 p-value from the lags=1..10 table
else:
    val_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast)
test_residuals = test.iloc[look_back:] - test_forecast

# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
    test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # lag-1 p-value from the lags=1..10 table
else:
    test_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)

# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
# Note: kstest(..., 'norm') compares against the *standard* normal N(0, 1); the residuals are not standardized first
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')

# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 6. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_forecast.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test_forecast.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'MLP Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_cleaned, lags=min(20, len(val_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_cleaned, lags=min(20, len(test_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: MLP
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[6] = {"ModelName": "MLP", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[6] = {"ModelName": "MLP", "ResValues": tempSeries.values}
Epoch 1/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 9s 2ms/step - loss: 0.0077 - val_loss: 5.1756e-04
Epoch 2/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 10s 2ms/step - loss: 0.0011 - val_loss: 5.7394e-04
Epoch 3/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 7s 1ms/step - loss: 0.0010 - val_loss: 7.4281e-04
Epoch 4/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 11s 1ms/step - loss: 0.0010 - val_loss: 5.4199e-04
Epoch 5/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 6.0700e-04
Epoch 6/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 10s 1ms/step - loss: 0.0010 - val_loss: 5.6668e-04
Epoch 7/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 5.5554e-04
Epoch 8/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 11s 2ms/step - loss: 0.0010 - val_loss: 5.7778e-04
Epoch 9/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 6.5952e-04
Epoch 10/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 7.1140e-04
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  3.3882  0.7041  0.8722  0.7607             0.6299              0.5312                0.0298
Test        3.1002  0.6255  0.7725  0.5967             0.2098              0.8347                0.0318
Test Interpretation Table:
            Randomness  KS-Normality
Validation  Random      Not Normal
Test        Random      Not Normal

14. Long short-term memory (LSTM) model¶

Below is the Python code that builds the LSTM model.

This code implements a Long Short-Term Memory (LSTM) model to forecast the measurements taken at station A601 in southeastern Brazil.

It splits the data into training, validation, and test sets, and uses a time window (look_back) to predict the next value.

The model is trained on normalized data and evaluated with the MAPE, MAE, RMSE, and MSE metrics. In addition, statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are run to analyze the residuals.

Finally, the code plots the forecasts, residuals, and autocorrelation analysis, and stores the residuals in tables for later use.

The metric values and the interpretation of the normality and randomness test results are presented in separate tables.
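As an illustration of the look_back windowing described above, here is a minimal, self-contained sketch (on toy data, not the station series) of how a sliding window turns a univariate series into supervised (X, y) pairs, mirroring the create_dataset helper used in the cell below:

```python
import numpy as np
import pandas as pd

def make_windows(series, look_back):
    # Each row of X holds `look_back` consecutive values;
    # y holds the value that immediately follows each window.
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series.iloc[i:i + look_back].values)
        y.append(series.iloc[i + look_back])
    return np.array(X), np.array(y)

toy = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
X, y = make_windows(toy, look_back=3)
print(X.shape)  # (3, 3): three windows of length 3
print(y)        # [13. 14. 15.]
```

For the LSTM, each such window is additionally reshaped to (samples, time steps, 1 feature), which is what the reshape calls in the cell below do.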

In [65]:
# JDF-2025Mar09 - LSTM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# 0. Set the original data in data_ts
data_ts = data_ori

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: last 24 values, preceded by look_back (6) extra observations for windowing
test = data_ts.iloc[-test_size-6:]

# Validation: the 24 values before test, preceded by look_back (6) extra observations
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Prepare data for LSTM
def create_dataset(data, look_back=24):
    """
    Create sequences of data for LSTM input.

    Parameters:
        data (pd.Series): Input time series.
        look_back (int): Number of previous time steps to use as input.

    Returns:
        X, y: Input features and target values.
    """
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data.iloc[i:i + look_back].values)
        y.append(data.iloc[i + look_back])
    return np.array(X), np.array(y)

# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))

# Create datasets
look_back = 6  # Use the last 6 hours to predict the next hour
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)

# Reshape input to be [samples, time steps, features] for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# 3. Build the LSTM model
def build_lstm_model(input_shape):
    """
    Build an LSTM model.

    Parameters:
        input_shape (tuple): Shape of the input data (time steps, features).

    Returns:
        model: Compiled LSTM model.
    """
    model = Sequential()
    model.add(LSTM(50, input_shape=input_shape, activation='relu'))  # LSTM layer
    model.add(Dense(1))  # Output layer
    model.compile(optimizer='adam', loss='mse')
    return model

# Build and train the model
model = build_lstm_model((X_train.shape[1], X_train.shape[2]))
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=1)

# 4. Forecast for validation and test
def forecast_lstm(model, data, look_back):
    """
    Forecast future values using the LSTM model.

    Parameters:
        model: Trained LSTM model.
        data (np.array): Input data.
        look_back (int): Number of previous time steps to use as input.

    Returns:
        forecast: Forecasted values.
    """
    forecast = []
    for i in range(len(data) - look_back):
        X = data[i:i + look_back].reshape(1, look_back, 1)
        y_pred = model.predict(X, verbose=0)
        forecast.append(y_pred[0, 0])
    return np.array(forecast)

# Forecast for validation
val_forecast_scaled = forecast_lstm(model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_forecast = pd.Series(val_forecast, index=validation.index[look_back:])

# Forecast for test
test_forecast_scaled = forecast_lstm(model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_forecast = pd.Series(test_forecast, index=test.index[look_back:])

# 5. Calculate metrics
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast)
val_residuals = validation.iloc[look_back:] - val_forecast

# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
    val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
    val_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast)
test_residuals = test.iloc[look_back:] - test_forecast

# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
    test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
    test_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)

# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')

# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 6. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_forecast.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test_forecast.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'LSTM Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_cleaned, lags=min(20, len(val_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_cleaned, lags=min(20, len(test_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: LSTM
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[7] = {"ModelName": "LSTM", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[7] = {"ModelName": "LSTM", "ResValues": tempSeries.values}
Epoch 1/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 22s 4ms/step - loss: 0.0093 - val_loss: 6.4560e-04
Epoch 2/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 20s 4ms/step - loss: 0.0011 - val_loss: 6.1571e-04
Epoch 3/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 20s 4ms/step - loss: 0.0011 - val_loss: 5.4599e-04
Epoch 4/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 21s 4ms/step - loss: 0.0010 - val_loss: 5.8509e-04
Epoch 5/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 42s 4ms/step - loss: 0.0010 - val_loss: 6.0400e-04
Epoch 6/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 42s 4ms/step - loss: 0.0010 - val_loss: 7.4651e-04
Epoch 7/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 39s 4ms/step - loss: 0.0010 - val_loss: 8.3446e-04
Epoch 8/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 44s 4ms/step - loss: 9.9106e-04 - val_loss: 6.4935e-04
Epoch 9/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 23s 4ms/step - loss: 9.9718e-04 - val_loss: 6.1355e-04
Epoch 10/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 22s 4ms/step - loss: 9.7365e-04 - val_loss: 5.7034e-04
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  3.0971  0.6505  0.7809  0.6099             0.2904              0.5312                0.5811
Test        2.7527  0.5637  0.6992  0.4888             0.5200              0.8347                0.4878
Test Interpretation Table:
            Randomness  KS-Normality
Validation  Random      Normal
Test        Random      Normal

15. Autoregressive Integrated Moving Average (ARIMA) model¶

The code implements an ARIMA model to forecast a time series of meteorological data recorded at station A601 in southeastern Brazil, splitting the data into training, validation, and test sets.

Data preparation:

    The time series is split into three sets: training, validation, and test.

    The frequency of the time series is set to hourly ('H').

ARIMA modeling:

    An ARIMA model with parameters (p, d, q) = (1, 1, 1) is defined and fitted to the training data.

    Forecasts are produced for the validation and test sets.

Model evaluation:

    Error metrics such as MAPE, MAE, RMSE, and MSE are computed to assess forecast accuracy.

    Statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are run on the residuals to check their randomness and normality.

Visualization:

    The training, validation, and test data are plotted together with the forecasts.

    The residuals are analyzed and visualized with histograms, density plots, Q-Q plots, and the ACF (autocorrelation function).

Results:

    Tables report the evaluation metrics and the interpretation of the statistical tests.

In short, the code implements an ARIMA model to forecast the temperature over the next 24 hours from the given series, evaluates its performance with metrics and statistical tests, and visualizes the results and residuals for analysis.

The metric values and the interpretation of the normality and randomness test results are presented in separate tables.
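Before the full statsmodels fit in the cell below, the mechanics of an ARIMA(1,1,1) one-step forecast can be sketched by hand. With assumed (not estimated) coefficients phi and theta, the model acts on the first differences w_t = y_t - y_{t-1} as w_t = phi*w_{t-1} + theta*e_{t-1} + e_t, and the level forecast undoes the differencing:

```python
import numpy as np

def arima_111_one_step(y, phi, theta, resid_last):
    # One-step ARIMA(1,1,1) forecast with known coefficients:
    # difference the series, apply AR(1)+MA(1) to the differences,
    # then integrate back by adding the last observed level.
    w_last = y[-1] - y[-2]                  # last first-difference
    w_next = phi * w_last + theta * resid_last
    return y[-1] + w_next                   # undo the differencing

y = np.array([20.0, 21.0, 23.0, 22.0])
forecast = arima_111_one_step(y, phi=0.5, theta=0.3, resid_last=0.2)
print(forecast)  # 22.0 + 0.5*(-1.0) + 0.3*0.2 = 21.56
```

In the actual model below, statsmodels estimates phi, theta, and the residuals by maximum likelihood; the toy values here are purely illustrative.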

In [66]:
# JDF-2025Mar12 - ARIMA

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error

# 0. Set the original data in data_ts
data_ts = data_ori

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Set the frequency of the datetime index
data_ts = data_ts.asfreq('H')

# Test: Last 24 values
test = data_ts.iloc[-test_size:]

# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Build and fit the ARIMA model
def fit_arima_model(data, order):
    """
    Fit an ARIMA model to the data.

    Parameters:
        data (pd.Series): Input time series.
        order (tuple): (p, d, q) parameters of the ARIMA model.

    Returns:
        model: Fitted ARIMA model.
    """
    model = ARIMA(data, order=order)
    return model.fit()

# Fit ARIMA model to training data
arima_order = (1, 1, 1)  # Adjust the (p, d, q) order as needed
arima_model = fit_arima_model(train, arima_order)

# 3. Forecast
def forecast_arima(model, steps):
    """
    Forecast future values using the ARIMA model.

    Parameters:
        model: Fitted ARIMA model.
        steps (int): Number of steps to forecast.

    Returns:
        forecast: Forecasted values as a Pandas Series.
    """
    forecast = model.forecast(steps=steps)
    return forecast

# Forecast for validation and test
val_forecast = forecast_arima(arima_model, len(validation))
combined_train = pd.concat([train, validation])
arima_model = fit_arima_model(combined_train, arima_order)
test_forecast = forecast_arima(arima_model, len(test))

# 4. Calculate metrics
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Validation metrics
val_metrics = calculate_metrics(validation, val_forecast)
val_residuals = validation - val_forecast

# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
    val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
    val_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Test metrics
test_metrics = calculate_metrics(test, test_forecast)
test_residuals = test - test_forecast

# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
    test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
    test_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)

# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')

# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 5. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'ARIMA Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 6. Residual Analysis: Validation
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)

val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', ax=axes[0, 1])
if val_residuals.notna().sum() > 1:  # Ensure there are at least two valid residuals
    val_residuals.plot(kind='kde', label='Validation Residuals Density', ax=axes[0, 1])
    axes[0, 1].set_title('Validation Residuals Histogram and Density')
    axes[0, 1].legend()
else:
    print("Not enough data points for KDE plot.")

axes[0, 1].grid(True, alpha=0.3)

stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')

plot_acf(val_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')

plt.tight_layout()
plt.show()

# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)

test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', ax=axes[0, 1])
if test_residuals.notna().sum() > 1:  # Ensure there are at least two valid residuals
    test_residuals.plot(kind='kde', label='Test Residuals Density', ax=axes[0, 1])
    axes[0, 1].set_title('Test Residuals Histogram and Density')
    axes[0, 1].legend()
else:
    print("Not enough data points for KDE plot.")

axes[0, 1].grid(True, alpha=0.3)

stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')

plot_acf(test_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')

plt.tight_layout()
plt.show()

# Add the model: ARIMA
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[8] = {"ModelName": "ARIMA", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[8] = {"ModelName": "ARIMA", "ResValues": tempSeries.values}
Metrics Table:
              MAPE     MAE    RMSE      MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  6.6364  1.4075  1.7045   2.9054                0.0              0.0004                0.1055
Test       12.8064  2.9088  3.8886  15.1211                0.0              0.0001                0.0000
Test Interpretation Table:
            Randomness  KS-Normality
Validation  Not Random  Normal
Test        Not Random  Not Normal

16. Autoregressive Integrated Moving Average - Rolling (ARIMA-Rolling) model¶

The code implements a rolling-forecast ARIMA model (ARIMA-Rolling) to forecast a time series of meteorological data recorded at station A601 in southeastern Brazil, splitting the data into training, validation, and test sets.

Data preparation:

    The time series is split into three sets: training, validation, and test.

    The frequency of the time series is set to hourly ('H').

ARIMA modeling:

    An ARIMA model with parameters (p, d, q) = (1, 1, 1) is defined and fitted to the training data.

    Forecasts for the validation and test sets are produced one step at a time, refitting the model after each new observation is appended to the history (rolling forecast).

Model evaluation:

    Error metrics such as MAPE, MAE, RMSE, and MSE are computed to assess forecast accuracy.

    Statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are run on the residuals to check their randomness and normality.

Visualization:

    The training, validation, and test data are plotted together with the forecasts.

    The residuals are analyzed and visualized with histograms, density plots, Q-Q plots, and the ACF (autocorrelation function).

Results:

    Tables report the evaluation metrics and the interpretation of the statistical tests.

In short, the code implements a rolling-forecast ARIMA model to forecast the temperature over the next 24 hours from the given series, evaluates its performance with metrics and statistical tests, and visualizes the results and residuals for analysis.

The metric values and the interpretation of the normality and randomness test results are presented in separate tables.
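The rolling scheme itself is independent of ARIMA: at every step a model is fit on all data seen so far, one step ahead is forecast, and the true value is then appended to the history. A minimal sketch with a naive last-value "model" (toy numbers, not the station series) shows the loop structure used in the cell below:

```python
def rolling_forecast(history, test, fit_predict):
    # fit_predict(history) returns a one-step-ahead forecast;
    # after each step the observed value joins the history.
    history = list(history)
    preds = []
    for actual in test:
        preds.append(fit_predict(history))
        history.append(actual)  # expand the training window
    return preds

naive = lambda h: h[-1]  # "model": repeat the last observed value
preds = rolling_forecast([18.0, 19.0, 20.0], [21.0, 22.0, 23.0], naive)
print(preds)  # [20.0, 21.0, 22.0]
```

In the actual cell, fit_predict is a full ARIMA(1,1,1) refit, which is why the rolling variant is much slower than the single-fit ARIMA above.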

In [67]:
# JDF-2025Mar09 - ARIMA-Rolling
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error

# 0. Set the original data in data_ts
data_ts = data_ori

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Set the frequency of the datetime index
data_ts = data_ts.asfreq('H')

# Test: Last 24 values
test = data_ts.iloc[-test_size:]

# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Build and fit the ARIMA model
def fit_arima_model(data, order):
    """
    Fit an ARIMA model to the data.

    Parameters:
        data (pd.Series): Input time series.
        order (tuple): (p, d, q) parameters of the ARIMA model.

    Returns:
        model: Fitted ARIMA model.
    """
    model = ARIMA(data, order=order)
    return model.fit()

# 3. Rolling Forecast
def rolling_forecast_arima(train, test, order):
    """
    Perform rolling forecasts using the ARIMA model.

    Parameters:
        train (pd.Series): Training data.
        test (pd.Series): Test data.
        order (tuple): (p, d, q) parameters of the ARIMA model.

    Returns:
        forecast: Forecasted values as a Pandas Series.
    """
    history = train.copy()
    forecast = []

    for i in range(len(test)):
        # Fit the ARIMA model
        model = fit_arima_model(history, order)
        # Forecast one step ahead
        yhat = model.forecast(steps=1)
        forecast.append(yhat.iloc[0])
        # Append the actual test value to history
        history = pd.concat([history, test.iloc[i:i+1]])

    return pd.Series(forecast, index=test.index)

# ARIMA order
arima_order = (1, 1, 1)  # Adjust the (p, d, q) order as needed

# Rolling forecast for validation
val_forecast = rolling_forecast_arima(train, validation, arima_order)

# Rolling forecast for test
combined_train = pd.concat([train, validation])
test_forecast = rolling_forecast_arima(combined_train, test, arima_order)

# 4. Calculate metrics
def calculate_metrics(actual, forecast):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }
    return metrics

# Validation metrics
val_metrics = calculate_metrics(validation, val_forecast)
val_residuals = validation - val_forecast

# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
    val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1
else:
    val_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Test metrics
test_metrics = calculate_metrics(test, test_forecast)
test_residuals = test - test_forecast

# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
    test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1
else:
    test_metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)

# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test (against a standard normal N(0, 1))
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')

# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 5. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'ARIMA-Rolling Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 6. Residual Analysis: Validation
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)

val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', ax=axes[0, 1])
if val_residuals.notna().sum() > 1:  # Ensure there are at least two valid residuals
    val_residuals.plot(kind='kde', label='Validation Residuals Density', ax=axes[0, 1])
    axes[0, 1].set_title('Validation Residuals Histogram and Density')
    axes[0, 1].legend()
else:
    print("Not enough data points for KDE plot.")

stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')

plot_acf(val_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')

plt.tight_layout()
plt.show()

# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)

test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', ax=axes[0, 1])
if test_residuals.notna().sum() > 1:  # Ensure there are at least two valid residuals
    test_residuals.plot(kind='kde', label='Test Residuals Density', ax=axes[0, 1])
    axes[0, 1].set_title('Test Residuals Histogram and Density')
    axes[0, 1].legend()
else:
    print("Not enough data points for KDE plot.")

axes[0, 1].grid(True, alpha=0.3)

stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')

plot_acf(test_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')

plt.tight_layout()
plt.show()

# Add the model: ARIMA-Rolling
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[9] = {"ModelName": "ARIMA-Rolling", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[9] = {"ModelName": "ARIMA-Rolling", "ResValues": tempSeries.values}
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  3.0830  0.6450  0.7942  0.6308             0.7384              0.8062                0.7772
Test        2.9237  0.5964  0.7443  0.5540             0.1419              0.9429                0.5619

Test Interpretation Table:
           Randomness KS-Normality
Validation     Random       Normal
Test           Random       Normal

16. Modelo Support Vector Regression - SVR¶

El código implementa un modelo de Support Vector Regression (SVR) para predecir la temperatura de las siguientes 24 horas con base en los datos de la serie de tiempo obtenida en la estación A601 del sudoeste de Brasil. Las librerías utilizadas son: statsmodels y sklearn.

A continuación, se resume lo más relevante:

Preparación de datos:

    La serie de tiempo se divide en conjuntos de entrenamiento, validación y prueba.

    Se rellenan los valores faltantes (NaN) con el método ffill.

    Los datos se escalan utilizando MinMaxScaler para normalizarlos en el rango [0, 1].
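Un esbozo mínimo (con valores supuestos, no los de la estación) del escalado a [0, 1] y de la transformación inversa que se aplican más abajo sobre los pronósticos:

```python
# MinMaxScaler lleva cada valor a (x - min) / (max - min) y
# inverse_transform deshace el escalado para volver a °C.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
temperaturas = np.array([[15.0], [20.0], [25.0]])  # columna única, como la serie
escaladas = scaler.fit_transform(temperaturas)
print(escaladas.flatten().tolist())   # [0.0, 0.5, 1.0]
originales = scaler.inverse_transform(escaladas)
print(originales.flatten().tolist())  # [15.0, 20.0, 25.0]
```

Nótese que el scaler se ajusta solo con entrenamiento y luego se reutiliza para validación y prueba, como hace la celda siguiente.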

Construcción del modelo SVR:

    Se utiliza GridSearchCV para encontrar los mejores hiperparámetros del modelo SVR (C, epsilon, kernel).

    Se entrena el modelo SVR con los datos de entrenamiento.

Pronóstico:

    Se realizan pronósticos para los conjuntos de validación y prueba utilizando el modelo SVR entrenado.

    Los pronósticos se transforman de nuevo a la escala original.

Evaluación del modelo:

    Se calculan métricas de error como MAPE, MAE, RMSE y MSE.

    Se realizan pruebas estadísticas (Ljung-Box, Wald-Wolfowitz y Kolmogorov-Smirnov) para analizar los residuos y verificar su aleatoriedad y normalidad.
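A modo de ilustración (con valores supuestos), las métricas de error mencionadas se calculan así:

```python
# MAE: error absoluto medio; MSE: error cuadrático medio; RMSE: su raíz;
# MAPE: promedio de los errores relativos, en porcentaje.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

real = np.array([20.0, 22.0, 24.0])
pronostico = np.array([21.0, 22.0, 22.0])

mae = mean_absolute_error(real, pronostico)               # (1 + 0 + 2) / 3 = 1.0
mse = mean_squared_error(real, pronostico)                # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)
mape = np.mean(np.abs((real - pronostico) / real)) * 100  # promedio de errores relativos
print(round(mae, 4), round(mse, 4), round(rmse, 4), round(mape, 4))
```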

Visualización:

    Se grafican los datos de entrenamiento, validación, prueba y los pronósticos.

    Se analizan y visualizan los residuos mediante histogramas, gráficos de densidad, Q-Q plots y ACF (Autocorrelación).

Resultados:

    Se presentan tablas con las métricas de evaluación y la interpretación de las pruebas estadísticas.

En resumen, el código implementa un modelo SVR para predecir una serie temporal, evalúa su rendimiento mediante métricas y pruebas estadísticas, y visualiza los resultados y residuos para su análisis.

Los resultados de las métricas y la interpretación de los resultados de normalidad y aleatoriedad se presentan en tablas independientes.

In [68]:
# JDF-2025Mar09 - SVR (without AFS)

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from statsmodels.sandbox.stats.runs import runstest_1samp


# 0. Fix NaN values and prepare the dataset
data_ts = data_ori.ffill().iloc[-1000:]  # forward-fill NaNs and keep the last 1000 observations

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

# Test: last 24 values, plus look_back=6 earlier values to build the first input window
test = data_ts.iloc[-test_size-6:]

# Validation: the 24 values before test, plus look_back=6 earlier values
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]

# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]

# 2. Prepare data for the SVR model
def create_dataset(data, look_back=6):
    """
    Create sequences of data for SVR input.
    """
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data.iloc[i:i + look_back].values)
        y.append(data.iloc[i + look_back])
    return np.array(X), np.array(y)

# Scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))

# Create datasets
look_back = 6
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)

# 3. Build and train the SVR model
def build_svr_model(X_train, y_train):
    param_grid = {
        'C': [0.1, 1, 10],
        'epsilon': [0.01, 0.1, 0.2],
        'kernel': ['rbf', 'linear']
    }
    model = GridSearchCV(SVR(), param_grid, cv=3, scoring='neg_mean_squared_error')
    model.fit(X_train, y_train)
    print("Best SVR parameters:", model.best_params_)
    return model.best_estimator_

svr_model = build_svr_model(X_train, y_train)

# Forecast for validation and test
def forecast_svr(model, data, look_back):
    forecast = []
    for i in range(len(data) - look_back):
        X = data[i:i + look_back].reshape(1, look_back)
        y_pred = model.predict(X)
        forecast.append(y_pred[0])
    return np.array(forecast)

# Forecast for validation
val_forecast_scaled = forecast_svr(svr_model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_residuals = validation.iloc[look_back:] - val_forecast

# Forecast for test
test_forecast_scaled = forecast_svr(svr_model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_residuals = test.iloc[look_back:] - test_forecast

# 4. Calculate metrics
def calculate_metrics(actual, forecast, residuals):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }

    # Calculate Ljung-Box p-value
    residuals_cleaned = residuals.dropna()
    if len(residuals_cleaned) > 0:
        ljungbox_result = acorr_ljungbox(residuals_cleaned, lags=10, return_df=True)
        metrics['Ljung-Box p-value'] = ljungbox_result['lb_pvalue'].values[0]  # p-value at lag 1
    else:
        metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

    return metrics

# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast, val_residuals)

# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast, test_residuals)

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)

# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')

# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])

# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 5. Plot final forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index[look_back:], val_forecast, label='Validation SVR Forecast', color='orange', linestyle='--')
plt.plot(test.index[look_back:], test_forecast, label='Test SVR Forecast', color='purple', linestyle='--')

plt.title('SVR Forecast')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 6. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals, lags=min(20, len(val_residuals) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals, lags=min(20, len(test_residuals) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: SVR
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[10] = {"ModelName": "SVR", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[10] = {"ModelName": "SVR", "ResValues": tempSeries.values}
Best SVR parameters: {'C': 10, 'epsilon': 0.01, 'kernel': 'linear'}
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  3.2721  0.6778  0.8168  0.6672             0.6946              0.8062                0.1562
Test        3.1501  0.6364  0.7637  0.5832             0.2419              0.8609                0.0975

Test Interpretation Table:
           Randomness KS-Normality
Validation     Random       Normal
Test           Random       Normal

17. Modelo Propuesto Support Vector Regression - Adaptive Fourier Series (SVR-AFS)¶

A continuación, el código Python para generar el modelo SVR-AFS.

El código implementa un modelo SVR (Support Vector Regression) mejorado con Adaptive Fourier Series (AFS) para predecir la temperatura en las siguientes 24 horas a partir de la serie de tiempo que contiene los datos meteorológicos de la estación A601 del sudoeste de Brasil. A continuación, se resume lo más relevante:

Preparación de datos:

    La serie temporal se divide en conjuntos de entrenamiento, validación y prueba.

    Se rellenan los valores faltantes (NaN) con el método ffill.

    Los datos se escalan utilizando MinMaxScaler para normalizarlos en el rango [0, 1].

Modelado SVR inicial:

    Se entrena un modelo SVR utilizando GridSearchCV para encontrar los mejores hiperparámetros (C, epsilon, kernel).

    Se realizan pronósticos para los conjuntos de validación y prueba.

Mejora con Adaptive Fourier Series (AFS):

    Se calculan los residuos del modelo SVR inicial.

    Se generan características de Fourier (componentes seno y coseno) a partir de los residuos.

    Se entrena un segundo modelo SVR sobre estas características para corregir los residuos.

Pronóstico corregido:

    Se ajustan los pronósticos iniciales utilizando las correcciones de residuos obtenidas del segundo modelo SVR.
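El esquema de corrección en dos etapas puede esbozarse con datos sintéticos (serie, características y parámetros supuestos, no los del modelo principal): un primer pronóstico captura solo la tendencia y un segundo modelo, entrenado sobre características seno/coseno análogas a las AFS, estima la corrección de los residuos.

```python
# Etapa 1: pronóstico base (solo tendencia). Etapa 2: un SVR sobre
# características de Fourier aprende la parte periódica de los residuos;
# el pronóstico corregido es base + corrección.
import numpy as np
from sklearn.svm import SVR

t = np.linspace(0, 1, 100)
serie = 2.0 * t + 0.3 * np.sin(2 * np.pi * 5 * t)  # tendencia + componente periódica

pronostico_base = 2.0 * t                          # etapa 1: solo captura la tendencia
residuos = serie - pronostico_base                 # queda la parte periódica

# etapa 2: características seno/coseno de los residuos
X = np.column_stack([np.sin(2 * np.pi * 5 * t), np.cos(2 * np.pi * 5 * t)])
modelo_residuos = SVR(kernel='linear', C=10.0, epsilon=0.01).fit(X, residuos)

pronostico_corregido = pronostico_base + modelo_residuos.predict(X)
error_base = np.mean(np.abs(serie - pronostico_base))
error_corr = np.mean(np.abs(serie - pronostico_corregido))
print(error_corr < error_base)  # la corrección reduce el error
```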

Evaluación del modelo:

    Se calculan métricas de error como MAPE, MAE, RMSE y MSE.

    Se realizan pruebas estadísticas (Ljung-Box, Wald-Wolfowitz y Kolmogorov-Smirnov) para analizar los residuos y verificar su aleatoriedad y normalidad.
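La intuición del test de rachas de Wald-Wolfowitz puede ilustrarse contando rachas por encima y por debajo de la mediana (implementación casera ilustrativa, no la de `runstest_1samp` de statsmodels usada en la celda): residuos alternantes producen el máximo de rachas y residuos agrupados el mínimo; ambos extremos sugieren no aleatoriedad.

```python
# Una "racha" es una secuencia consecutiva del mismo signo respecto a la
# mediana; el test compara el número de rachas con el esperado bajo azar.
import numpy as np

def contar_rachas(residuos):
    signos = residuos > np.median(residuos)
    return 1 + int(np.sum(signos[1:] != signos[:-1]))

alternante = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # máximo de rachas
agrupado = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])    # mínimo de rachas
print(contar_rachas(alternante), contar_rachas(agrupado))  # 6 2
```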

Visualización:

    Se grafican los datos de entrenamiento, validación, prueba y los pronósticos corregidos.

    Se analizan y visualizan los residuos mediante histogramas, gráficos de densidad, Q-Q plots y ACF (Autocorrelación).

Resultados:

    Se presentan tablas con las métricas de evaluación y la interpretación de las pruebas estadísticas.

En resumen, el código implementa un modelo SVR mejorado con AFS para predecir una serie temporal, corrige los residuos utilizando características de Fourier, evalúa su rendimiento mediante métricas y pruebas estadísticas, y visualiza los resultados y residuos para su análisis.

Los resultados de las métricas y la interpretación de los resultados de normalidad y aleatoriedad se presentan en tablas independientes.

In [69]:
# JDF-2025Mar10- SVR-AFS

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from statsmodels.sandbox.stats.runs import runstest_1samp

# Adaptive Fourier Series (AFS) for Residuals
def adaptive_fourier_series(data, n_components=3):
    """
    Generate Adaptive Fourier Series (AFS) features for a given series.

    Parameters:
        data (pd.Series): Input time series (residuals).
        n_components (int): Number of Fourier components to use.

    Returns:
        pd.DataFrame: DataFrame with AFS features (cosine and sine components).
    """
    t = np.linspace(0, 1, len(data), endpoint=False)
    afs_features = pd.DataFrame(index=data.index)

    for n in range(1, n_components + 1):
        afs_features[f'cos_{n}'] = np.cos(2 * np.pi * n * t * data.values)
        afs_features[f'sin_{n}'] = np.sin(2 * np.pi * n * t * data.values)

    return afs_features

# 0. Fix NaN values and prepare the dataset
data_ts = data_ori.ffill().iloc[-1000:]  # forward-fill NaNs and keep the last 1000 observations

# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24

test = data_ts.iloc[-test_size-6:]
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]
train = data_ts.iloc[:-(test_size + val_size)]
train2 = train = data_ts.iloc[:-test_size]  # Note: redefines train to also include the validation segment

# 2. Prepare data for the initial SVR model
def create_dataset(data, look_back=6):
    """
    Create sequences of data for SVR input.
    """
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data.iloc[i:i + look_back].values)
        y.append(data.iloc[i + look_back])
    return np.array(X), np.array(y)

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))

look_back = 6
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)

# 3. Build and train the initial SVR model
def build_svr_model(X_train, y_train):
    param_grid = {
        'C': [0.1, 1, 10],
        'epsilon': [0.01, 0.1, 0.2],
        'kernel': ['rbf', 'linear']
    }
    model = GridSearchCV(SVR(), param_grid, cv=3, scoring='neg_mean_squared_error')
    model.fit(X_train, y_train)
    print("Best SVR parameters:", model.best_params_)
    return model.best_estimator_

svr_model = build_svr_model(X_train, y_train)

# Forecast for validation and test
def forecast_svr(model, data, look_back):
    forecast = []
    for i in range(len(data) - look_back):
        X = data[i:i + look_back].reshape(1, look_back)
        y_pred = model.predict(X)
        forecast.append(y_pred[0])
    return np.array(forecast)

val_forecast_scaled = forecast_svr(svr_model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_residuals = validation.iloc[look_back:] - val_forecast

test_forecast_scaled = forecast_svr(svr_model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_residuals = test.iloc[look_back:] - test_forecast

# 4. Add AFS features for residuals and train a second SVR-AFS model
n_components = 3
val_residuals_features = adaptive_fourier_series(val_residuals, n_components)
test_residuals_features = adaptive_fourier_series(test_residuals, n_components)

# Train SVR on residuals
X_residuals = val_residuals_features.values
y_residuals = val_residuals.values

svr_residual_model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr_residual_model.fit(X_residuals, y_residuals)

# Correct the forecasts using the residual SVR model
val_residual_corrections = svr_residual_model.predict(val_residuals_features)
val_corrected_forecast = val_forecast + val_residual_corrections

test_residual_corrections = svr_residual_model.predict(test_residuals_features)
test_corrected_forecast = test_forecast + test_residual_corrections

# JDF - 2025Mar09 - Residuals after the AFS correction
val_residuals_def = validation.iloc[look_back:] - val_corrected_forecast
test_residuals_def = test.iloc[look_back:] - test_corrected_forecast

# 5. Calculate metrics
def calculate_metrics(actual, forecast, residuals):
    """
    Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
    """
    actual = actual.replace(0, 1e-9)  # Avoid division by zero
    metrics = {
        'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
        'MAE': mean_absolute_error(actual, forecast),
        'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
        'MSE': mean_squared_error(actual, forecast),
    }

    # Calculate Ljung-Box p-value
    residuals_cleaned = residuals.dropna()
    if len(residuals_cleaned) > 0:
        ljungbox_result = acorr_ljungbox(residuals_cleaned, lags=10, return_df=True)
        metrics['Ljung-Box p-value'] = ljungbox_result['lb_pvalue'].values[0]  # p-value at lag 1
    else:
        metrics['Ljung-Box p-value'] = np.nan  # Handle case where all residuals are NaN

    return metrics

# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_corrected_forecast, val_residuals_def)

# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_corrected_forecast, test_residuals_def)

# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals_def)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals_def)

# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue

# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals_def, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals_def, 'norm')

# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue

# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])


# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
    'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
    'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])

# Print the tables
print("Metrics Table:")
display(metrics_table.round(4))  # Use display() to render the DataFrame

print("\nTest Interpretation Table:")
display(interpretation_table.round(4))  # Use display() to render the DataFrame

# 6. Plot final corrected forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index[look_back:], val_corrected_forecast, label='Validation SVR-AFS Forecast', color='orange', linestyle='--')
plt.plot(test.index[look_back:], test_corrected_forecast, label='Test SVR-AFS Forecast', color='purple', linestyle='--')

plt.title('SVR-AFS Forecast with Residual Corrections')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals_def.index, val_residuals_def, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals_def.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals_def.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_def, lags=min(20, len(val_residuals_def) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)

# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_def, lags=min(20, len(test_residuals_def) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Add the model: SVR-AFS
tempSeries = pd.Series(val_residuals_def)
ResTableVal.loc[11] = {"ModelName": "SVR-AFS", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals_def)
ResTableTest.loc[11] = {"ModelName": "SVR-AFS", "ResValues": tempSeries.values}
Best SVR parameters: {'C': 10, 'epsilon': 0.01, 'kernel': 'linear'}
Metrics Table:
              MAPE     MAE    RMSE     MSE  Ljung-Box p-value  Randomness p-value  KS-Normality p-value
Validation  2.1869  0.4546  0.6653  0.4426             0.4072              0.0577                0.0933
Test        2.5641  0.5322  0.6960  0.4844             0.4042              0.9429                0.1784

Test Interpretation Table:
           Randomness KS-Normality
Validation     Random       Normal
Test           Random       Normal

The following code implements the Diebold-Mariano test to compare the forecast accuracy of pairs of models, using the residuals obtained on the Validation set. The key points are:

Diebold-Mariano test:

    The diebold_mariano_test function compares two forecasting models using a loss function (squared or absolute error).

    It computes the DM statistic and its p-value to determine whether the two forecasts differ significantly in accuracy.

Model comparison:

    The code iterates over all unique pairs of models stored in ResTableVal.

    For each pair, the residuals (forecast errors) are retrieved and the Diebold-Mariano test is applied.

Results:

    The test results (DM statistic and p-value) are stored in a DataFrame (ResDM).

    Finally, the DataFrame with all pairwise comparisons is printed.

In short, the code compares the forecast accuracy of the different models with the Diebold-Mariano test, stores the results in a DataFrame, and prints them for analysis.

In [70]:
# JDF - 2025Mar11 - Diebold-Mariano - Validation

import pandas as pd
import numpy as np
from scipy.stats import norm

def diebold_mariano_test(actual, forecast1, forecast2, loss_function='squared'):
    """
    Performs the Diebold-Mariano test to compare two forecasts.

    Args:
        actual (np.array): Array of actual values.
        forecast1 (np.array): Array of forecast values from model 1.
        forecast2 (np.array): Array of forecast values from model 2.
        loss_function (str): 'squared' for squared error, 'absolute' for absolute error.

    Returns:
        float: DM statistic.
        float: p-value.
    """

    if len(actual) != len(forecast1) or len(actual) != len(forecast2):
        raise ValueError("Arrays must have the same length.")

    if loss_function == 'squared':
        d = (actual - forecast1)**2 - (actual - forecast2)**2
    elif loss_function == 'absolute':
        d = np.abs(actual - forecast1) - np.abs(actual - forecast2)
    else:
        raise ValueError("Invalid loss function. Choose 'squared' or 'absolute'.")

    mean_d = np.mean(d)
    var_d = np.var(d, ddof=1)  # Use ddof=1 for sample variance
    n = len(d)

    if var_d == 0:  # all loss differentials identical: no difference
        return 0.0, 1.0

    dm_stat = mean_d / np.sqrt(var_d / n)
    p_value = 2 * (1 - norm.cdf(abs(dm_stat)))

    return dm_stat, p_value

# Initialize an empty dataframe to store DM test results
ResDM = pd.DataFrame(columns=['ModelName1', 'ModelName2', 'DMStatistic', 'DMpValue'])

# Get the list of unique models
models = ResTableVal['ModelName'].unique()


# Iterate over all unique pairs of models
for i in range(len(models)):
    for j in range(i + 1, len(models)):
        model1 = models[i]
        model2 = models[j]

        # Get residuals for the two models
        res1 = ResTableVal[ResTableVal['ModelName'] == model1]['ResValues'].iloc[0]
        res2 = ResTableVal[ResTableVal['ModelName'] == model2]['ResValues'].iloc[0]
        actual_values = np.zeros(len(res1))  # zero actuals: the residuals are the forecast errors

        # Perform Diebold-Mariano test
        dm_result = diebold_mariano_test(actual_values, res1, res2, loss_function='squared')

        # Extract DM statistic and p-value
        dm_statistic = dm_result[0]
        dm_pvalue = dm_result[1]

        # Append the result to the ResDM dataframe using pd.concat
        new_row = pd.DataFrame({
            'ModelName1': [model1],
            'ModelName2': [model2],
            'DMStatistic': [dm_statistic],
            'DMpValue': [dm_pvalue]
        })
        ResDM = pd.concat([ResDM, new_row], ignore_index=True)

# Print the resulting dataframe
print(ResDM)
       ModelName1 ModelName2  DMStatistic  DMpValue
0             SES        DES    -1.095432  0.273327
1             SES     SES-HW    -3.300037  0.000967
2             SES     DES-HW    -1.659247  0.097066
3             SES     TES-HW    -1.276185  0.201890
4             SES        RNN     1.755112  0.079240
..            ...        ...          ...       ...
61          ARIMA        SVR     2.939122  0.003291
62          ARIMA    SVR-AFS     3.264993  0.001095
63  ARIMA-Rolling        SVR    -0.401083  0.688359
64  ARIMA-Rolling    SVR-AFS     2.148632  0.031664
65            SVR    SVR-AFS     3.385631  0.000710

[66 rows x 4 columns]
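The diebold_mariano_test above uses a simple sample variance and a normal reference distribution, which is reasonable for one-step-ahead forecasts but can over-reject with short series. Below is a minimal sketch of the Harvey-Leybourne-Newbold (HLN) small-sample correction on squared-error losses; the residual arrays and horizon h are illustrative, not taken from this notebook:

```python
import numpy as np
from scipy.stats import t

def dm_test_hln(res1, res2, h=1):
    """Diebold-Mariano test on squared-error losses with the
    Harvey-Leybourne-Newbold (HLN) small-sample correction."""
    d = np.asarray(res1) ** 2 - np.asarray(res2) ** 2   # loss differential
    n = len(d)
    mean_d = d.mean()
    # Long-run variance: variance plus autocovariances up to lag h-1
    gamma = [d.var(ddof=0) if k == 0 else np.cov(d[: n - k], d[k:])[0, 1]
             for k in range(h)]
    var_d = (gamma[0] + 2 * sum(gamma[1:])) / n
    dm = mean_d / np.sqrt(var_d)
    # HLN correction factor and Student-t reference distribution
    correction = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    dm_hln = correction * dm
    p_value = 2 * t.sf(abs(dm_hln), df=n - 1)
    return dm_hln, p_value

rng = np.random.default_rng(0)
res_a = rng.normal(0.0, 1.0, 100)   # illustrative residuals, model A
res_b = rng.normal(0.0, 1.5, 100)   # illustrative residuals, model B (worse)
stat, p = dm_test_hln(res_a, res_b, h=1)
print(f"HLN-corrected DM = {stat:.4f}, p = {p:.4f}")
```

For h=1 the correction factor reduces to sqrt((n-1)/n), so results stay close to the uncorrected statistic; the adjustment matters more for multi-step horizons.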

The code ranks the models based on the Diebold-Mariano results for the residuals obtained on the Validation set. The key points are:

Model evaluation:

    It uses the results stored in ResDM, which contains the DM statistics and p-values from the pairwise model comparisons.

    Each model is awarded points whenever it significantly outperforms another model in the Diebold-Mariano test.

Significance criterion:

    If the p-value is below the significance level (0.05), the difference between the two models is considered significant.

    The better-performing model (according to the sign of the DM statistic) receives one additional point.

Model ranking:

    Models are sorted from highest to lowest accumulated score.

    The ranking is printed, showing each model's position and score.

In short, the code ranks the models by their relative performance in the Diebold-Mariano test, producing a best-to-worst ordering based on accumulated scores.

In [71]:
# JDF - 2025Mar11 - Diebold-Mariano

import pandas as pd

# Initialize a dictionary to store model scores
model_scores = {}

# Significance level
significance_level = 0.05

# Iterate through each row in ResDM
for index, row in ResDM.iterrows():
    model1 = row['ModelName1']
    model2 = row['ModelName2']
    dm_statistic = row['DMStatistic']
    dm_pvalue = row['DMpValue']

    # Initialize scores for models if not already present
    if model1 not in model_scores:
        model_scores[model1] = 0
    if model2 not in model_scores:
        model_scores[model2] = 0

    # Update scores based on DM results
    if dm_pvalue < significance_level:  # Statistically significant difference
        if dm_statistic < 0:  # Model1 is better
            model_scores[model1] += 1
        else:  # Model2 is better
            model_scores[model2] += 1

# Sort models by their scores in descending order
ranked_models = sorted(model_scores.keys(), key=lambda x: model_scores[x], reverse=True)

# Print the ranking
print("Model Ranking (Best to Worst):")
for rank, model in enumerate(ranked_models, start=1):
    print(f"{rank}. {model} (Score: {model_scores[model]})")
Model Ranking (Best to Worst):
1. SVR-AFS (Score: 10)
2. RNN (Score: 5)
3. LSTM (Score: 5)
4. ARIMA-Rolling (Score: 5)
5. MLP (Score: 4)
6. SVR (Score: 4)
7. SES (Score: 2)
8. DES (Score: 0)
9. SES-HW (Score: 0)
10. DES-HW (Score: 0)
11. TES-HW (Score: 0)
12. ARIMA (Score: 0)
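Note that the ranking above assigns distinct positions to tied models (RNN, LSTM, and ARIMA-Rolling all score 5 yet appear as ranks 2-4). A minimal sketch of a dense, tie-aware ranking for the same kind of score dictionary; the scores below are an illustrative subset, not the notebook's actual model_scores:

```python
# Dense ranking: tied scores share a rank.
model_scores = {"SVR-AFS": 10, "RNN": 5, "LSTM": 5, "ARIMA-Rolling": 5,
                "MLP": 4, "SES": 2, "DES": 0}  # illustrative subset

# sorted() is stable, so tied models keep their insertion order
ranked = sorted(model_scores.items(), key=lambda kv: kv[1], reverse=True)

rank_of = {}
rank, prev_score = 0, None
for model, score in ranked:
    if score != prev_score:   # advance the rank only when the score drops
        rank += 1
        prev_score = score
    rank_of[model] = rank
    print(f"{rank}. {model} (Score: {score})")
```

With this scheme the three models scoring 5 all share rank 2, which reads more honestly than arbitrary tie-breaking.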

The following code implements the Diebold-Mariano test to compare the forecast accuracy of pairs of models, using the residuals obtained on the Test set. The key points are:

Diebold-Mariano test:

    The diebold_mariano_test function compares two forecasting models using a loss function (squared or absolute error).

    It computes the DM statistic and its p-value to determine whether the two forecasts differ significantly in accuracy.

Model comparison:

    The code iterates over all unique pairs of models stored in ResTableTest.

    For each pair, the residuals (forecast errors) are retrieved and the Diebold-Mariano test is applied.

Results:

    The test results (DM statistic and p-value) are stored in a DataFrame (ResDM).

    Finally, the DataFrame with all pairwise comparisons is printed.

In short, the code compares the forecast accuracy of the different models with the Diebold-Mariano test, stores the results in a DataFrame, and prints them for analysis.

In [72]:
# JDF - 2025Mar11 - Diebold-Mariano - Test

import pandas as pd
import numpy as np
from scipy.stats import norm

def diebold_mariano_test(actual, forecast1, forecast2, loss_function='squared'):
    """
    Performs the Diebold-Mariano test to compare two forecasts.

    Args:
        actual (np.array): Array of actual values.
        forecast1 (np.array): Array of forecast values from model 1.
        forecast2 (np.array): Array of forecast values from model 2.
        loss_function (str): 'squared' for squared error, 'absolute' for absolute error.

    Returns:
        float: DM statistic.
        float: p-value.
    """

    if len(actual) != len(forecast1) or len(actual) != len(forecast2):
        raise ValueError("Arrays must have the same length.")

    if loss_function == 'squared':
        d = (actual - forecast1)**2 - (actual - forecast2)**2
    elif loss_function == 'absolute':
        d = np.abs(actual - forecast1) - np.abs(actual - forecast2)
    else:
        raise ValueError("Invalid loss function. Choose 'squared' or 'absolute'.")

    mean_d = np.mean(d)
    var_d = np.var(d, ddof=1)  # Use ddof=1 for sample variance
    n = len(d)

    if var_d == 0:  # all loss differentials identical: no difference
        return 0.0, 1.0

    dm_stat = mean_d / np.sqrt(var_d / n)
    p_value = 2 * (1 - norm.cdf(abs(dm_stat)))

    return dm_stat, p_value

# Initialize an empty dataframe to store DM test results
ResDM = pd.DataFrame(columns=['ModelName1', 'ModelName2', 'DMStatistic', 'DMpValue'])

# Get the list of unique models
models = ResTableTest['ModelName'].unique()


# Iterate over all unique pairs of models
for i in range(len(models)):
    for j in range(i + 1, len(models)):
        model1 = models[i]
        model2 = models[j]

        # Get residuals for the two models
        res1 = ResTableTest[ResTableTest['ModelName'] == model1]['ResValues'].iloc[0]
        res2 = ResTableTest[ResTableTest['ModelName'] == model2]['ResValues'].iloc[0]
        actual_values = np.zeros(len(res1))  # zero actuals: the residuals are the forecast errors

        # Perform Diebold-Mariano test
        dm_result = diebold_mariano_test(actual_values, res1, res2, loss_function='squared')

        # Extract DM statistic and p-value
        dm_statistic = dm_result[0]
        dm_pvalue = dm_result[1]

        # Append the result to the ResDM dataframe using pd.concat
        new_row = pd.DataFrame({
            'ModelName1': [model1],
            'ModelName2': [model2],
            'DMStatistic': [dm_statistic],
            'DMpValue': [dm_pvalue]
        })
        ResDM = pd.concat([ResDM, new_row], ignore_index=True)

# Print the resulting dataframe
print(ResDM)
       ModelName1 ModelName2  DMStatistic  DMpValue
0             SES        DES    -0.938372  0.348053
1             SES     SES-HW    -2.966626  0.003011
2             SES     DES-HW    -1.296910  0.194662
3             SES     TES-HW     0.871441  0.383514
4             SES        RNN     2.307004  0.021055
..            ...        ...          ...       ...
61          ARIMA        SVR     3.859046  0.000114
62          ARIMA    SVR-AFS     3.899375  0.000096
63  ARIMA-Rolling        SVR    -0.260840  0.794216
64  ARIMA-Rolling    SVR-AFS     0.703635  0.481660
65            SVR    SVR-AFS     0.797416  0.425210

[66 rows x 4 columns]

The code ranks the models based on the Diebold-Mariano results for the residuals obtained on the Test set. The key points are:

Model evaluation:

    It uses the results stored in ResDM, which contains the DM statistics and p-values from the pairwise model comparisons.

    Each model is awarded points whenever it significantly outperforms another model in the Diebold-Mariano test.

Significance criterion:

    If the p-value is below the significance level (0.05), the difference between the two models is considered significant.

    The better-performing model (according to the sign of the DM statistic) receives one additional point.

Model ranking:

    Models are sorted from highest to lowest accumulated score.

    The ranking is printed, showing each model's position and score.

In short, the code ranks the models by their relative performance in the Diebold-Mariano test, producing a best-to-worst ordering based on accumulated scores.

In [73]:
# JDF - 2025Mar11 - Diebold-Mariano - Test

import pandas as pd

# Initialize a dictionary to store model scores
model_scores = {}

# Significance level
significance_level = 0.05

# Iterate through each row in ResDM
for index, row in ResDM.iterrows():
    model1 = row['ModelName1']
    model2 = row['ModelName2']
    dm_statistic = row['DMStatistic']
    dm_pvalue = row['DMpValue']

    # Initialize scores for models if not already present
    if model1 not in model_scores:
        model_scores[model1] = 0
    if model2 not in model_scores:
        model_scores[model2] = 0

    # Update scores based on DM results
    if dm_pvalue < significance_level:  # Statistically significant difference
        if dm_statistic < 0:  # Model1 is better
            model_scores[model1] += 1
        else:  # Model2 is better
            model_scores[model2] += 1

# Sort models by their scores in descending order
ranked_models = sorted(model_scores.keys(), key=lambda x: model_scores[x], reverse=True)

# Print the ranking
print("Model Ranking (Best to Worst):")
for rank, model in enumerate(ranked_models, start=1):
    print(f"{rank}. {model} (Score: {model_scores[model]})")
Model Ranking (Best to Worst):
1. RNN (Score: 5)
2. MLP (Score: 5)
3. LSTM (Score: 5)
4. ARIMA-Rolling (Score: 5)
5. SVR (Score: 5)
6. SVR-AFS (Score: 5)
7. TES-HW (Score: 3)
8. SES (Score: 2)
9. DES (Score: 1)
10. SES-HW (Score: 1)
11. DES-HW (Score: 1)
12. ARIMA (Score: 0)
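To quantify how much the Validation and Test rankings agree, one option is Kendall's rank correlation between the two orderings printed above. A minimal sketch using scipy.stats.kendalltau; the lists below simply transcribe the two printed rankings (including their arbitrary ordering of ties):

```python
from scipy.stats import kendalltau

# Model order from the two rankings printed above (best to worst).
val_rank  = ["SVR-AFS", "RNN", "LSTM", "ARIMA-Rolling", "MLP", "SVR",
             "SES", "DES", "SES-HW", "DES-HW", "TES-HW", "ARIMA"]
test_rank = ["RNN", "MLP", "LSTM", "ARIMA-Rolling", "SVR", "SVR-AFS",
             "TES-HW", "SES", "DES", "SES-HW", "DES-HW", "ARIMA"]

# Paired rank vectors: each model's position in both orderings.
pos_val  = [val_rank.index(m) for m in val_rank]
pos_test = [test_rank.index(m) for m in val_rank]

tau, p = kendalltau(pos_val, pos_test)
print(f"Kendall tau between rankings: {tau:.3f} (p = {p:.4f})")
```

A tau near 1 would mean the two sets rank the models almost identically; here the positive but imperfect agreement reflects SVR-AFS dropping from first place on Validation to a shared top score on Test.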